384 research outputs found

    Hierarchical maximum likelihood clustering approach

    Objective: In this work, we focused on developing a clustering approach for biological data. In many biological analyses, such as multi-omics data analysis and genome-wide association study (GWAS) analysis, it is crucial to find groups of data belonging to subtypes of diseases or tumors. Methods: Conventionally, the k-means clustering algorithm is overwhelmingly applied in many areas, including the biological sciences. There are, however, several alternative clustering algorithms that can be applied, including support vector clustering. In this paper, taking into consideration the nature of biological data, we propose a maximum likelihood clustering scheme based on a hierarchical framework. Results: This method can perform clustering even when the data belonging to different groups overlap. It can also perform clustering when the number of samples is lower than the data dimensionality. Conclusion: The proposed scheme does not require selecting initial settings to begin the search process. In addition, it does not require computing the first and second derivatives of likelihood functions, as many other maximum-likelihood-based methods do. Significance: This algorithm uses distribution and centroid information to cluster a sample and was applied to biological data. A Matlab implementation of this method can be downloaded from http://www.riken.jp/en/research/labs/ims/med_sci_math/

    Genome-wide SNP data of Izumo and Makurazaki populations support inner-dual structure model for origin of Yamato people

    The “Dual Structure” model of the formation of the modern Japanese population assumes that the indigenous hunter-gathering population (symbolized as Jomon people) admixed with the rice-farming population (symbolized as Yayoi people) who migrated from the Asian continent after the Yayoi period started. The Jomon component remained high in both the Ainu and Okinawa people, who mainly reside in northern and southern Japan, respectively, while the Yayoi component is higher in the mainland Japanese (Yamato people). The model has been well supported by genetic data, but the Yamato population was mostly represented by people from the Tokyo area. We generated new genome-wide SNP data using the Japonica Array for 45 individuals in Izumo City of Shimane Prefecture and for 72 individuals in Makurazaki City of Kagoshima Prefecture in Southern Kyushu, and compared these data with those of other human populations in East Asia, including BioBank Japan data. Using principal component analysis, phylogenetic networks, and f4 tests, we found that the Izumo, Makurazaki, and Tohoku populations are slightly differentiated from those of the Kanto (including Tokyo), Tokai, and Kinki regions. These results suggest that the substructure within Mainland Japanese may be caused by multiple migration events from the Asian continent following the Jomon period, and we propose a modified version of the “Dual Structure” model called the “Inner-Dual Structure” model.

    Imputation of KIR Types from SNP Variation Data.

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR∗IMP, a method for imputation of KIR copy number. We show that KIR∗IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. This work was supported by the Australian National Health and Medical Research Council (NHMRC), Career Development Fellowship ID 1053756 (S.L.); by a Victorian Life Sciences Computation Initiative (VLSCI) grant number VR0240 on its Peak Computing Facility at the University of Melbourne, an initiative of the Victorian Government, Australia (S.L.); by the UK Multiple Sclerosis Society, grant 894/08 (S.S.); and by the Wellcome Trust and the MRC with partial funding from the National Institute of Health Cambridge Biomedical Research Centre (J.T., J.A.T.). Research at the Murdoch Childrens Research Institute was supported by the Victorian Government's Operational Infrastructure Support Program. This is the final version of the article. It first appeared from Elsevier via http://dx.doi.org/10.1016/j.ajhg.2015.09.00

    Genome-wide association studies identify polygenic effects for completed suicide in the Japanese population

    Suicide is a significant public health problem worldwide, and several Asian countries, including Japan, have relatively high suicide rates on a world scale. Twin, family, and adoption studies have suggested high heritability for suicide, but genetic research lags behind because of the difficulty of obtaining samples from individuals who died by suicide, especially in non-European populations. In this study, we carried out genome-wide association studies combining two independent datasets totaling 746 suicides and 14,049 non-suicide controls in the Japanese population. Although we identified no genome-wide significant single-nucleotide polymorphisms (SNPs), we demonstrated significant SNP-based heritability (35–48%; P < 0.001) for completed suicide by genomic restricted maximum-likelihood analysis and a shared genetic risk between the two datasets (P_best = 2.7 × 10^−13) by polygenic risk score analysis. This study is the first genome-wide association study for suicidal behavior in an East Asian population, and our results provide evidence of a polygenic architecture underlying completed suicide.

    Germline pathogenic variants of 11 breast cancer genes in 7,051 Japanese patients and 11,241 controls

    Pathogenic variants in highly penetrant genes are useful for the diagnosis, therapy, and surveillance of hereditary breast cancer. Large-scale studies are needed to inform future testing and variant classification processes in the Japanese population. We performed a case-control association study for variants in coding regions of 11 hereditary breast cancer genes in 7,051 unselected breast cancer patients and 11,241 female controls of Japanese ancestry. Here, we identify 244 germline pathogenic variants. Pathogenic variants are found in 5.7% of patients, ranging from 15% in women diagnosed <40 years to 3.2% in patients ≥80 years, with BRCA1/2 explaining two-thirds of pathogenic variants identified at all ages. BRCA1/2, PALB2, and TP53 are significant causative genes. Patients with pathogenic variants in BRCA1/2 or PTEN have a significantly younger age at diagnosis. In conclusion, BRCA1/2, PALB2, and TP53 are the major hereditary breast cancer genes, irrespective of age at diagnosis, in Japanese women.

    Association of Common Variants in TNFRSF13B, TNFSF13, and ANXA3 with Serum Levels of Non-Albumin Protein and Immunoglobulin Isotypes in Japanese

    We performed a genome-wide association study (GWAS) on levels of serum total protein (TP), albumin (ALB), and non-albumin protein (NAP). We analyzed SNPs on autosomal chromosomes using data from 9,103 Japanese individuals, followed by a replication study of 1,600 additional individuals. We confirmed the previously reported association of GCKR on chromosome 2p23.3 with serum ALB (rs1260326, Pmeta = 3.1×10^−9), and additionally identified a genome-wide significant association of rs4985726 in TNFRSF13B on 17p11.2 with both TP and NAP (Pmeta = 1.2×10^−14 and 7.1×10^−24, respectively). For NAP, rs3803800 and rs11552708 in TNFSF13 on 17p13.1 (Pmeta = 7.2×10^−15 and 7.5×10^−10, respectively) as well as rs10007186 on 4q21.2 near ANXA3 (Pmeta = 1.3×10^−9) also showed significant associations. Interestingly, TNFRSF13B and TNFSF13 encode a tumor necrosis factor (TNF) receptor and its ligand, which together constitute an important receptor-ligand axis for B-cell homeostasis and immunoglobulin production. Furthermore, three SNPs in TNFRSF13B and TNFSF13 (rs4985726, rs3803800, and rs11552708) were associated with serum levels of IgG (P<2.3×10^−3) and IgM (P<0.018), while rs3803800 and rs11552708 were associated with IgA (P<0.013). rs10007186 on 4q21.2 was associated with serum levels of IgA (P = 0.036), IgM (P = 0.019), and IgE (P = 4.9×10^−4). Our results add to the knowledge of how major serum components are regulated.

    Combined landscape of single-nucleotide variants and copy number alterations in clonal hematopoiesis

    Elucidating the impact of clonal hematopoiesis on clinical prognosis: integrated findings on gene mutations and copy number abnormalities. Kyoto University press release. 2021-07-09. Clonal hematopoiesis (CH) in apparently healthy individuals is implicated in the development of hematological malignancies (HM) and cardiovascular diseases. Previous studies of CH analyzed either single-nucleotide variants and indels (SNVs/indels) or copy number alterations (CNAs), but not both. Here, using a combination of targeted sequencing of 23 CH-related genes and array-based CNA detection of blood-derived DNA, we have delineated the landscape of CH-related SNVs/indels and CNAs in 11,234 individuals without HM from the BioBank Japan cohort, including 672 individuals with subsequent HM development, and studied the effects of these somatic alterations on mortality from HM and cardiovascular disease, as well as on hematological and cardiovascular phenotypes. The total number of both types of CH-related lesions and their clone size positively correlated with blood count abnormalities and mortality from HM. CH-related SNVs/indels and CNAs exhibited statistically significant co-occurrence in the same individuals. In particular, co-occurrence of SNVs/indels and CNAs affecting DNMT3A, TET2, JAK2 and TP53 resulted in biallelic alterations of these genes and was associated with higher HM mortality. Co-occurrence of SNVs/indels and CNAs also modulated risks for cardiovascular mortality. These findings highlight the importance of detecting both SNVs/indels and CNAs in the evaluation of CH.

    Polygenic burden in focal and generalized epilepsies.

    Rare genetic variants can cause epilepsy, and genetic testing has been widely adopted for severe, paediatric-onset epilepsies. The phenotypic consequences of common genetic risk burden for epilepsies and their potential future clinical applications have not yet been determined. Using polygenic risk scores (PRS) from a European-ancestry genome-wide association study in generalized and focal epilepsy, we quantified common genetic burden in patients with generalized epilepsy (GE-PRS) or focal epilepsy (FE-PRS) from two independent non-Finnish European cohorts (Epi25 Consortium, n = 5705; Cleveland Clinic Epilepsy Center, n = 620; both compared to 20 435 controls). One Finnish-ancestry population isolate (Finnish-ancestry Epi25, n = 449; compared to 1559 controls), two European-ancestry biobanks (UK Biobank, n = 383 656; Vanderbilt biorepository, n = 49 494), and one Japanese-ancestry biobank (BioBank Japan, n = 168 680) were used for additional replications. Across 8386 patients with epilepsy and 622 212 population controls, we found and replicated significantly higher GE-PRS in patients with generalized epilepsy of European ancestry compared to patients with focal epilepsy (Epi25: P = 1.64×10^−15; Cleveland: P = 2.85×10^−4; Finnish-ancestry Epi25: P = 1.80×10^−4) or population controls (Epi25: P = 2.35×10^−70; Cleveland: P = 1.43×10^−7; Finnish-ancestry Epi25: P = 3.11×10^−4; UK Biobank and Vanderbilt biorepository meta-analysis: P = 7.99×10^−4). FE-PRS were significantly higher in patients with focal epilepsy compared to controls in the non-Finnish, non-biobank cohorts (Epi25: P = 5.74×10^−19; Cleveland: P = 1.69×10^−6). European ancestry-derived PRS did not predict generalized epilepsy or focal epilepsy in Japanese-ancestry individuals. Finally, we observed a significant 4.6-fold and a 4.5-fold enrichment of patients with generalized epilepsy compared to controls in the top 0.5% highest GE-PRS of the two non-Finnish European cohorts (Epi25: P = 2.60×10^−15; Cleveland: P = 1.39×10^−2). We conclude that common variant risk associated with epilepsy is significantly enriched in multiple cohorts of patients with epilepsy compared to controls, in particular for generalized epilepsy. As sample sizes and PRS accuracy continue to increase with further common variant discovery, PRS could complement established clinical biomarkers and augment genetic testing for patient classification, comorbidity research, and potentially targeted treatment.
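
    The polygenic risk scores in the abstract above can be illustrated with a minimal sketch. It assumes the textbook form of a PRS, a sum of risk-allele dosages weighted by per-SNP effect sizes, and a simplified enrichment measure: the share of cases divided by the share of controls above a top-fraction score cutoff. The function names, the toy numbers, and the enrichment definition are assumptions made for illustration only. They are not the study's pipeline, which may well have used a different statistic such as an odds ratio.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per SNP).

    `weights` are per-SNP effect sizes, e.g. log odds ratios from a GWAS
    (hypothetical values here, not taken from the study).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def top_fraction_enrichment(case_scores, control_scores, fraction=0.005):
    """Share of cases over share of controls at or above the top-`fraction` cutoff.

    The cutoff is taken from the pooled score distribution. This is a
    simplified illustration, not necessarily the statistic used in the paper.
    """
    pooled = sorted(list(case_scores) + list(control_scores), reverse=True)
    n_top = max(1, int(len(pooled) * fraction))
    cutoff = pooled[n_top - 1]
    case_share = sum(s >= cutoff for s in case_scores) / len(case_scores)
    control_share = sum(s >= cutoff for s in control_scores) / len(control_scores)
    if control_share == 0:
        return float("inf")
    return case_share / control_share


# Toy example: three SNPs with made-up effect sizes.
score = polygenic_risk_score([2, 1, 0], [0.5, -0.25, 1.0])
print(score)
```

    With real data, the dosages would come from genotype calls or imputation, and the weights from an independent discovery GWAS. The abstract reports that weights derived from European-ancestry data did not predict epilepsy in Japanese-ancestry individuals.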